(57) Abstract: This relates to routing and caching systems for reducing
the bandwidth used by decentralised peer-to-peer (P2P) file sharing
networks. A method of reducing traffic in a decentralised peer-to-peer
network is described. The peer-to-peer network operates over an
underlying network comprising first and second network portions. The
method comprises routing a peer-to-peer message in one of said network
portions with an intended destination in the other of said network
portions to a gateway between peer-to-peer nodes residing on said first
and second network portions; and controlling transport of said message
at said gateway to limit propagation of said message into said other of
said network portions.

METHODS AND APPARATUS FOR TRAFFIC MANAGEMENT IN PEER-TO-PEER NETWORKS

This invention is generally concerned with apparatus, methods and computer program 
codes for managing and reducing traffic in peer-to-peer networks. More particularly the 
invention relates to routing and caching systems for reducing bandwidth used by 
decentralised peer-to-peer (P2P) file sharing networks.

Generally a peer-to-peer network comprises a plurality of nodes where each node 
represents a computer running a compatible peer-to-peer (P2P) client. Generally a node 
will operate as both a server and a client, and it may then be referred to as a servent 
(SERVer plus cliENT). Each node has a number of peers that it maintains direct 
connections to, the nodes or peers interacting as equals. A peer-to-peer network is
sometimes referred to as an overlay network since it overlays an underlying physical
network topology and may operate at a relatively high network layer, such as the
application layer. Generally the topology of a P2P network bears little or no
resemblance to the underlying physical network topology, as will be explained further
below. Most often P2P networks are implemented over the Internet, although they may
also be implemented on other networks such as intranets and extranets.

Peer-to-peer networking has a variety of applications including distributed processing 
and file sharing. In file sharing P2P systems users or nodes may exchange data files or 
fragments of data files without the need for the files (or fragments) to be stored on a 
central server. This contrasts with worldwide web protocols in which web clients 
download data held on web servers which are named, and specified by a fixed internet
protocol (IP) address. By contrast in a P2P network a data file may be found on one, or
more typically a plurality of, nodes, the addresses of which need not be fixed and may
change over time as nodes add or remove files from their local storage. (It will be
appreciated, however, that the IP address of any particular computer at a node may
remain fixed). It will be recognised that the structure of P2P networks imposes special
problems when searching for and downloading files.

Peer-to-peer networks fall into two broad categories, centralised P2P networks 100 and
decentralised P2P networks. Figure 1a illustrates an example of a centralised P2P
network and figure 1b illustrates an example of a decentralised P2P network 110. The
centralised P2P network has one or more central servers 102 to which each user node
104 must connect. Central server 102 maintains a centralised index or directory of
shared files, user nodes or peers 104 uploading lists of files which they are prepared to
make available to this central index. This central index may be queried by a node 104
to locate a file, following which the file may be directly retrieved from a node which the
central index shows has the file available. In a centralised P2P network a user must
register with the central server 102, which has a fixed, known address. Thus in a
centralised P2P network there is a restricted number of points of entry into the network,
defined by these central servers. In P2P system 100 central server 102 maintains a list
of users who are logged onto the network, lists of files shared at any given moment by a
user, and IP addresses of clients logged onto the central server to search for shared files.
A search result generally comprises a list of user names and IP addresses and of
corresponding file names found by the central server 102.

By contrast in the decentralised P2P network 110 of figure 1b a user may join the
network at any point and there is no requirement for registration with a single central
server or with one of a few particular, designated central servers. Thus in figure 1b the
user nodes 112, or peers, act as both clients and servers at the same time (servents) to
allow a request for a resource such as a file to propagate through the network until a
node or peer which has the resource is found.

More sophisticated P2P networks may include "supernodes" or "ultrapeers" each with a
plurality of "leaf" nodes; sometimes these are referred to as controlled decentralised
P2P networks. Such supernodes may index content available from nodes in a nearby
neighbourhood, such as content available from the leaf nodes, as well as communicating
among themselves, for example to satisfy queries. Supernodes may be dynamically
selected, for example by defining an initial set of supernodes and then allowing nodes
(peers) to vote amongst themselves to elect more. In terms of the searching process this
is analogous to identifying well-connected people to ask first about the availability of a
file or other resource.

The topology of a decentralised P2P network is not generally structured to be efficient 
in respect to the underlying physical network(s). File search requests or queries are 
generally broadcast from each node to all its neighbouring nodes, and query responses
are sent back along the route defined by the chain of queried nodes. In both 
decentralised and centralised networks, however, once a file has been located a direct 
connection such as a socket connection (address and port number) is generally 
established between the node at which the file is available and the requesting node. A 
node may also initiate a plurality of connections to other nodes or peers to attempt a 
plurality of concurrent downloads, for example to use the fastest, or a node may 
download multiple fragments of a file, for example in parallel from multiple peer nodes.

Thus the traffic on P2P file sharing networks may be categorised into network traffic
and download traffic, the network traffic comprising, for example, messages involved in
maintaining the P2P network infrastructure together with search requests and the
responses they generate. It will be appreciated from the foregoing that decentralised
P2P networks are able to generate significant quantities of network traffic. Furthermore
since many users of file sharing P2P networks use the network to share multimedia data,
downloaded file sizes, and hence the download traffic, are also usually large. Currently
P2P file sharing networks are responsible for a significant proportion of data traffic on
the Internet, and in particular on the networks of Internet Service Providers (ISPs) who
provide internet connectivity to consumers over broadband connections. Typically P2P
users generate around 50% of this traffic. There is a considerable cost associated with
routing and relaying this information, which is one problem which the present invention
addresses.

Referring to figure 1c, this shows six steps 120-130 illustrating the location and
retrieval of a file in a decentralised P2P network. In this decentralised P2P network
every P2P user is a node on the network and each node maintains a connection to a
small number of other nodes, in this example two but more typically four. At step 120
node A issues a query 132 to nodes B and C which propagates at step 122 to nodes D,
E, F and G. At step 124 node E sends a response or query hit 134 back to node B and
thence, at step 126, back to node A. At step 128 node A then issues a file download or
Get request 136 directly to node E which responds, at step 130, by sending the
requested file 138 to node A. 

As each node selects its peer nodes at random the "neighbouring" nodes of a peer may
physically be on the other side of the world. This is illustrated in table 1 below, which 
shows the direct peers that a Gnutella P2P client running in Cambridge, England was 
maintaining at the time of the measurements. 



Peer Address          Region                           Country

66J6.33J04 [sic]      Tyler, Texas                     USA
S0.13S.77.2S1 [sic]   Dortmund, Nordrhein-Westfalen    Germany
213.66.224.126        (unknown), (unknown)             Sw[illegible]
66.114.142.214        Glendale, California             USA

Table 1

This illustrates one of the main problems of P2P networks, which is that such
application level and other P2P networks do not take into account geographical
locations or usage costs associated with the underlying physical networks. A result of
this is that each node in an ISP's network typically has a number of peers from all over
the world and is constantly relaying large numbers of queries, and the corresponding
query results.

A P2P query has an associated time to live (TTL) which defines the number of network
hops the query is permitted to make. At each hop from one network node to another the
TTL counter is decremented until it reaches zero, when the query is no longer
propagated. A typical initial value of the TTL counter is T=7 and P2P peers generally
refuse packets with a TTL value greater than a permitted maximum to restrict the use of
network resources. However it can be seen that if each node maintains connections to P
other nodes the total number of computers through which a query propagates is
approximately determined by P^T and for, say, P=4 and T=7, P^T = 4^7 = 16384. It can
therefore be seen that P2P networks generate large quantities of network traffic.
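
By way of illustration only, the estimate above may be reproduced with the following
Python sketch. The function names are hypothetical and are not taken from any P2P
client. The second function is an added assumption, not part of the estimate above: it
models an idealised tree in which each node forwards a query to all of its peers except
the one from which the query arrived, which suggests that P^T is best read as an
approximate upper bound.

```python
def flood_reach_estimate(peers_per_node: int, ttl: int) -> int:
    """Approximate reach P**T of a flooded query, as used in the text."""
    return peers_per_node ** ttl


def flood_reach_tree(peers_per_node: int, ttl: int) -> int:
    """Reach in an idealised tree with no shared peers (an assumption).

    The originating node sends to P peers; every later hop fans out to
    P - 1 new peers because the query is not sent back where it came from.
    """
    reached = 0
    frontier = peers_per_node
    for _ in range(ttl):
        reached += frontier
        frontier *= peers_per_node - 1
    return reached


if __name__ == "__main__":
    # P=4 and T=7 are the example values given in the text.
    print(flood_reach_estimate(4, 7))
    print(flood_reach_tree(4, 7))
```

Either way the number of nodes touched by a single query grows exponentially with the
TTL, which is the point made above.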



Currently decentralised P2P systems include Gnutella (with applications such as
Morpheus, Bearshare and Limewire), FastTrack and KaZaA, the latest versions of
which use the above described supernode concept; an example of a centralised P2P
network is Napster. There are many uses for such P2P networks including the
distribution of free software such as Linux code, the distribution of games, and instant
messaging. Furthermore because the originator of a query or message is kept secret P2P
networks can be useful in countries where freedom of speech is limited. However
research indicates that at least in some networks the bulk of P2P traffic comprises audio
and video data files such as MP3 and AVI files (see, for example, study at the
University of Washington, "An Analysis of Internet Content Delivery Systems", Stefan
Saroiu, Krishna P. Gummadi, Richard J. Dunn, Steven D. Gribble, and Henry M. Levy,
Department of Computer Science & Engineering, University of Washington, Seattle,
WA 98195-2350, USA, in proceedings of the 5th Symposium on Operating Systems Design
and Implementation, Boston, MA, December 2002). Such audio and video data may
include previews of audio tracks, film trailers and the like. It has been found (ibid) that
the median object size for P2P networks is approximately 4MB, that is approximately
1000 times the average web document size (2KB) and that the average bandwidth
consumption of a P2P peer is approximately 90 times that of a web client.

The University of Washington study (ibid) also explored the benefits to be obtained
from a theoretical cache for KaZaA P2P traffic, finding that P2P caching should
potentially be highly beneficial. However no indication was provided as to how such
caching may be implemented in practice since, as will be appreciated from figure 1b,
the distributed nature of a decentralised peer-to-peer network would seem to make
caching impossible. Indeed the paper by Saroiu et al. explicitly stated that "our goal is
not to solve (or even identify) all of the complexities of P2P caching but rather to gain
insight into how important a role caching may play".

The caching of web content is an established technology that is widely used. The first 
web caching software was probably the CERN web proxy (developed at the CERN
laboratories in Switzerland by Tim Berners-Lee). Version 2.16 (dated February 1994)
contained the core caching functionality. Several other companies (Inktomi, Network 
Appliance & CacheFlow) developed and sold web caching products, and the freely 
available Squid web cache is widely used. 

With a web proxy cache, web requests (which use the Hypertext Transfer Protocol or
HTTP) get sent via a proxy rather than going directly to the server that hosts the
content. This proxy can store (or "cache") the data it downloads so that if another user
requests the same web page the data can be sent from its local copy. Web proxy
caches can reduce internet bandwidth usage and "speed up" access to the web.

Figures 2a and 2b illustrate the principle of a web cache, in which a web server 200 
communicates over the Internet 202 with a computer 204. In figure 2a computer 204 
talks directly to the remote web server 200 but in figure 2b computer 204 first
communicates with a local proxy server 206 which may send back copies of the 
requested data that it has cached locally. 
The distributed nature of a P2P network has led to the belief that efficient caching in
a P2P system is not possible. Traffic management within a P2P network has therefore
focused on relatively crude techniques such as limiting the time to live (TTL) of a P2P
packet, although one company, SandVine Inc of Waterloo, Ontario, Canada, offers a
product (the PPE8200) which purports to lower network costs for peer-to-peer traffic by
redirecting P2P downloads to hosts on a lower cost path before attempting hosts on a
higher cost path.

According to a first aspect of the present invention there is therefore provided a method 
of reducing traffic in a decentralised peer-to-peer network, said peer-to-peer network
operating over an underlying network comprising first and second network portions, the
method comprising routing a peer-to-peer message in one of said network portions with
an intended destination in the other of said network portions to a gateway between peer- 
to-peer nodes residing on said first and second network portions; and controlling 
transport of said message at said gateway to limit propagation of said message into said 
other of said network portions. 

The gateway resides at the boundary of a service provider's network to facilitate control
of traffic between the one network portion, that is, for example, the first network portion
managed by, say, an ISP, and the other network portion, in this example the second
network portion, comprising, say, the rest of the world. By routing peer-to-peer
messages, preferably all peer-to-peer messages intended for the other network portion,
via the gateway a range of transport control and traffic reduction functions become
possible. For example propagation of a message into the second network portion may
be prevented either by ignoring the message or redirecting the message or, as described
further below, responding to the message from a cache. The message controlled at the
gateway may comprise a query message or additionally or alternatively propagation of a
request for a resource made in response to the result of a query message (for example, in
response to a query hit) may be limited, for example by controlling a subsequent file
download or GET request. The routing may be performed by configuring a router to
transparently route all outgoing and potentially also all incoming P2P traffic via the
gateway. Thus the first and second network portions may comprise, for example,
portions of an underlying physical or logical network or domains of an IP-based
internet.

The concept of a gateway node allows the number of peer-to-peer connections across a 
boundary between the first and second network portions to be limited, thus controlling
the bandwidth across the boundary and hence also facilitating control of data transport
costs. Furthermore since in preferred embodiments all P2P traffic is routed via the
gateway node this node may act as a cache (for inbound and/or outbound traffic) and/or
router and/or bandwidth throttle and/or content filter. Thus, for example, unsuitable or
illegal content may be blocked at the gateway or messages may be blocked to limit or
throttle the traffic between the first and second network portions. A message may also
be redirected to a peer-to-peer node known to the gateway within its network portion.
Additionally or alternatively the gateway may respond to the message. Preferably,
therefore, the gateway comprises an active peer-to-peer node of the peer-to-peer 
network. 

A query hit or search result message passing through the gateway may be rewritten en
route so that the requesting node requests a file from a node different from that which
responded positively to the search request. Such rewriting may comprise rewriting an
index to a file within the search result or query hit or additionally or alternatively the
response may be rewritten to disguise the address from which the response originated or
to substitute an address, for example, an address of the gateway node, for an address in
the response. In the preferred embodiment the gateway comprises a cache for caching
responses to queries and/or files requested in file requests. In this way search and/or
download traffic may be reduced; preferably both search and download data is cached.
It will be appreciated that the concept of a gateway node facilitates such caching.
Preferably data is cached with a time stamp to allow the data to be discarded after
expiry of a period of time. The skilled person will appreciate that since in some P2P
networks downloading of partial files or fragments is permitted, a file cache may store
such file fragments and these may then be retrieved and served to a requesting node in
accordance with the P2P protocol.
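
Purely as an illustrative sketch, time-stamped caching with expiry of the kind just
described might look as follows in Python. The class name, the method names and the
injectable clock are assumptions made for this example and are not features of any
particular embodiment.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringCache:
    """Stores each entry with a time stamp and discards it once too old."""

    def __init__(self, max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_age = max_age_seconds
        self._clock = clock  # injectable so that expiry can be tested
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def put(self, key: str, data: bytes) -> None:
        # Record the time at which the data entered the cache.
        self._entries[key] = (self._clock(), data)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, data = entry
        if self._clock() - stored_at > self._max_age:
            # The period of validity has expired: discard the entry.
            del self._entries[key]
            return None
        return data
```

The same structure could hold query hits, whole files or file fragments; only the key
and the stored bytes would differ.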
Caching in a P2P network presents particular problems associated with the very large
volume of data transferred. Monitoring P2P networks suggests that the volume of file
data available for downloading is currently of the order of 5 Petabytes (5x10^15 bytes).
A general rule of thumb estimate is that to be effective a cache should be at least 1/100th
of the data transferred, in this case of the order of 50TB, and preferably more. Thus
caching at this scale would appear to be impractical. However, the inventor has
recognised that much of the traffic on a P2P network comprises multiple replications of
the same files, although not necessarily with the same file name. Thus, for example,
many nodes may store a Star Wars (Trademark) trailer, with different users adopting
different names for the trailer although storing an identical file. This recognition has led
to the concept of associating a content identifier such as a hash value with each file
and/or file request cached. As the skilled person will know, a hash value represents
concisely the longer message or file from which it was computed, and is thus able to
serve as an identifier of content irrespective of file name. Thus in preferred embodiments
such a hash value, sometimes known as a message digest or checksum, is used as an
index into the cache. Where the cache stores search responses or query hits, these
preferably include a hash value to facilitate their recall, and where a query hit is
rewritten, preferably it is rewritten with such a hash value. Likewise where file data is
cached this is preferably cached in association with a hash value to facilitate its
identification irrespective of a requested filename. Such a hash value may also be used
to retrieve additional file or data source-related information from a cache. In this way,
duplicate files, that is files with the same contents but, for example, with different
names and/or from different networks, need only be stored once in the cache.
Identification of duplicate files may be performed such that duplicate files are not stored
or additionally or alternatively duplicate files may be weeded from the cache at
intervals.
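
The use of a hash value as an index into the cache may be sketched as follows. This is
a minimal Python illustration under stated assumptions: SHA-1 from the standard
hashlib module is chosen arbitrarily as the content identifier, and the class and
method names are hypothetical.

```python
import hashlib
from typing import Dict, Optional, Set


class ContentAddressedCache:
    """Indexes file data by a hash of its contents rather than by name."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}      # content hash -> file data
        self._names: Dict[str, Set[str]] = {}   # content hash -> names seen

    @staticmethod
    def content_id(data: bytes) -> str:
        # The identifier depends only on the contents of the file.
        return hashlib.sha1(data).hexdigest()

    def store(self, filename: str, data: bytes) -> str:
        digest = self.content_id(data)
        # A duplicate (same contents, different name) is not stored again.
        self._files.setdefault(digest, data)
        self._names.setdefault(digest, set()).add(filename)
        return digest

    def fetch(self, digest: str) -> Optional[bytes]:
        return self._files.get(digest)

    def stored_copies(self) -> int:
        return len(self._files)
```

Storing the same trailer under two different file names yields one cache entry and a
single identifier, so the cache grows with the number of distinct files rather than
with the number of names in circulation.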

In some P2P networks messages have a "unique" identifier and in such networks the
gateway may operate to limit propagation of a message between the first and second
network portions so that each message is only ever relayed once across an ISP boundary.
Tracking P2P messages in this way also facilitates selective routing of messages based
upon the cost of the underlying physical network, and similarly facilitates satisfying
download requests based upon the cost of data transport (monetary, bandwidth or other)
over any network portion.
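
A minimal sketch of relaying each message at most once across the boundary is given
below. It assumes only that every message carries an identifier available as bytes;
the class name is hypothetical, and a practical gateway would also need to age old
identifiers out of the set, which is omitted here.

```python
from typing import Set


class BoundaryRelayFilter:
    """Remembers message identifiers already relayed across the boundary."""

    def __init__(self) -> None:
        self._seen: Set[bytes] = set()

    def should_relay(self, message_id: bytes) -> bool:
        # A message whose identifier has been seen before is not relayed again.
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True
```

The first arrival of a given identifier is relayed and any later copy reaching the
gateway by another route is dropped.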

In a related aspect the invention provides a computer network message controller for
reducing traffic in a decentralised peer-to-peer network, said peer-to-peer network
operating over a physical network comprising first and second network portions, said
network message controller comprising a router for routing a peer-to-peer message in
one of said network portions with an intended destination in the other of said
network portions to a gateway between peer-to-peer nodes residing on said first and
second network portions; and a gateway controller configured to control transport of
said message into said other of said network portions.

In a further related aspect the invention provides a gateway controller, in particular for
the computer network message controller of claim 18, for reducing traffic in a
decentralised peer-to-peer network operating over an underlying network comprising
first and second network portions, the controller being configured for operation at a
gateway between peer-to-peer nodes residing on said first and second network portions,
the gateway controller comprising an interface for said first and second network
portions, for receiving a peer-to-peer message in one of said network portions with
an intended destination in the other of said network portions; and a controller
configured to control transport of said message into said other of said network portions.

The invention also provides a peer-to-peer network cache comprising a network
interface for interfacing to a network over which said peer-to-peer network operates; a
data store for storing cached data files each in association with a data file identifier, a
data file identifier comprising a value computed from the contents of a data file it
identifies; program memory storing processor control code; and a processor coupled to
said network interface, to said data store, and to said program memory for implementing
said processor control code, said code comprising code for controlling the processor to
read peer-to-peer traffic on said network; identify a request for a data file within said
peer-to-peer traffic; identify the requested data file from a said data file identifier within
said peer-to-peer traffic; and provide said requested data file from said data store to a
peer-to-peer node making said request.

Embodiments of the peer-to-peer network cache provide, as described above, for P2P
caching, particularly in decentralised P2P networks. In a third embodiment the P2P
network cache comprises an active node on the P2P network. The data store for storing
the cache data files may be provided with commonly downloaded or large audio or
video data files, for example manually, but preferably data within the data store is
obtained from the P2P network. As previously mentioned a data file identifier
preferably comprises a hash value so that identical files with different file names have
the same hash value. Preferably a collision-resistant function is employed such as
MD2, MD5 or SHA1 (see below).

The use of a data file identifier comprising a value computed from the contents of a
data file it identifies facilitates caching in a P2P network when this might otherwise
be considered infeasible due to the large volume of data to be stored.

The request for a data file identified by the processor may comprise a Query (or search 
request) or a GET (or file request) message. The data file identifier by which the 
requested data file may be identified may either comprise a data file identifier such as a 
hash value in a query hit message sent by a machine in response to a query or the data 
file identifier may be included within the GET (or file request) command. The
identified, requested data file may then be provided by the cache.

In some embodiments the code may further comprise code to identify and rewrite a
response to the request, for example to rewrite an index in the response so that it refers
to the data store (that is the cache) so that when the file is requested it is the cache that is
indexed. Where the peer-to-peer network cache operates at a gateway as described
above (preferable, but not essential) there may be no need to rewrite an IP address
within the response so that it points to the cache since being located at the gateway it is
able to intercept a file download request prior to its reaching its intended destination.
However in other circumstances, for example where P2P traffic is not routed through
the cache, a source identifier within the response (IP address and/or port number) may
also be rewritten.
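
The rewriting of a response may be sketched as follows. The QueryHit record below is
a simplified, hypothetical stand-in for a query hit and does not reproduce the wire
format of any P2P protocol; the function name and the cache index mapping are likewise
assumptions. The addresses used in any example should be read as placeholders.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class QueryHit:
    ip: str            # address of the responding node
    port: int
    file_index: int    # index used later to request the file
    file_name: str
    content_hash: str  # data file identifier computed from file contents


def rewrite_query_hit(hit: QueryHit,
                      cache_index: Dict[str, int],
                      cache_address: Optional[Tuple[str, int]] = None
                      ) -> QueryHit:
    """Point a query hit at the cache when the file is already cached."""
    cached_index = cache_index.get(hit.content_hash)
    if cached_index is None:
        return hit  # not cached: pass the response through unchanged
    rewritten = replace(hit, file_index=cached_index)
    if cache_address is not None:
        # Only needed where P2P traffic is not routed through the cache.
        ip, port = cache_address
        rewritten = replace(rewritten, ip=ip, port=port)
    return rewritten
```

When the cache sits at the gateway the cache_address argument can be left unset, so
that only the index is rewritten and the subsequent download request is intercepted
on its way out.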

Where a requested data file is not yet stored in the cache the file may be downloaded
and stored in the cache and also relayed to the requester. Where the results of previous
file search requests (query hits) have been stored in the cache these may indicate one or
more sources for the requested file; alternatively the file may be obtained from a
source specified by the requester in the download request. A hash value for the file
may either be retrieved with the file or calculated once the file has been downloaded
into the cache. Alternatively file data for the cache may be obtained by snooping the
passing traffic where a previously cached query hit included such a value.
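
The handling of a download request for a file not yet in the cache may be sketched as
below. All names are hypothetical; the fetch callable stands in for whatever download
mechanism the P2P protocol provides, and rejecting a download whose calculated hash
does not match the requested one is an added assumption rather than a stated
requirement.

```python
import hashlib
from typing import Callable, Dict, List, Optional


def serve_download(requested_hash: str,
                   requester_source: str,
                   cache: Dict[str, bytes],
                   known_sources: Dict[str, List[str]],
                   fetch: Callable[[str], Optional[bytes]]
                   ) -> Optional[bytes]:
    """Return file data from the cache, filling the cache on a miss."""
    data = cache.get(requested_hash)
    if data is not None:
        return data  # cache hit: no traffic leaves the network portion

    # Prefer sources learnt from earlier query hits, then fall back to
    # the source named by the requester in the download request.
    candidates = list(known_sources.get(requested_hash, []))
    if requester_source not in candidates:
        candidates.append(requester_source)

    for source in candidates:
        data = fetch(source)
        if data is None:
            continue
        # Calculate the hash once the file has been downloaded.
        if hashlib.sha1(data).hexdigest() != requested_hash:
            continue
        cache[requested_hash] = data  # store for later requesters
        return data                   # and relay to this requester
    return None
```

A second request for the same hash is then answered from the cache without calling
fetch again.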

In a preferred embodiment the code includes code to identify duplicate data files and 
limit the number of such files, for example to a single copy of each set of identical data files,
even where such files have different names. 

In a corresponding aspect the invention provides a method of reducing traffic in a
distributed peer-to-peer network, the method comprising monitoring peer-to-peer traffic 
of said network; identifying a request for a data file within said peer-to-peer traffic; 
identifying the requested data file from a data file identifier associated with said request 
within said peer-to-peer traffic; and providing said requested data file from a cache to a 
peer-to-peer node making said request. 

Again the request for a data file may be a query and an associated query hit response 
may include the data file identifier, or the request itself may include the identifier. 
Preferably the peer-to-peer traffic monitoring takes place at a gateway between two 
different physical network portions or domains, for example between physical network 
portions distinguished by different most significant bits of their IP addresses. 
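
As one possible illustration of distinguishing network portions by the most significant
bits of an IP address, the following Python sketch uses the standard ipaddress module.
The function name and the example prefix are assumptions; in practice the prefix or
prefixes would be those of the ISP's own address space.

```python
import ipaddress


def crosses_boundary(src: str, dst: str, local_network: str) -> bool:
    """True when exactly one endpoint lies inside the local network portion.

    local_network is written in prefix notation, for example "10.0.0.0/8",
    so membership is decided by the most significant bits of each address.
    """
    network = ipaddress.ip_network(local_network)
    src_inside = ipaddress.ip_address(src) in network
    dst_inside = ipaddress.ip_address(dst) in network
    return src_inside != dst_inside
```

Traffic for which this test is true is the traffic that a gateway at the boundary would
see and could monitor or control.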

In a further aspect the invention provides a peer-to-peer network cache for modifying
peer-to-peer network traffic, the network cache comprising a network interface for
interfacing to a network over which said peer-to-peer network operates; a data store for
storing digital fingerprint data for identifying data files, and corresponding data file
source identifiers; a program memory storing processor control code; and a
processor coupled to said network interface, to said data store and to said program
memory for implementing said processor control code, said code comprising code for
controlling the processor to read peer-to-peer traffic on said network; identify a
response to a peer-to-peer file request within said peer-to-peer traffic, said response
including a digital fingerprint of a requested file; identify from said data store an
alternative source for said requested file to a source of said response; and reply to said
file request using said alternative source.

Caching data file source identifiers together with corresponding digital fingerprint data
for identifying the data files, such as hash value data, facilitates modifying (such as
redirecting) and reducing search request/response traffic in a P2P network, particularly
a decentralised P2P network. The P2P file request may either comprise a query or a
download request. If the request comprises a query the reply may comprise a query hit
comprising a cache index for retrieving a cache file; if the file request comprises a GET
or download request the reply may comprise the requested file, for example served from
a file cache or obtained from another source as identified by a source identifier stored in
the data store. Thus the reply may involve downloading a file from the alternate source
and then forwarding the downloaded file to the requester. If the file request comprises a
query and a source for the requested file is in the cache the reply may comprise a
rewritten query hit. If the file request comprises a download request and the requested
file is not cached the requested file may be obtained from the alternative source to allow
a response to the request.

In a preferred embodiment the processor control code further comprises code to 
populate the cache, for example by monitoring query and query hit traffic for storing in 
the data store. Where a requested file is cached, source identification data 
accompanying the digital fingerprint data of the file may simply comprise a flag to 
indicate that the file is stored in the cache. As previously mentioned where data file
data is cached this may comprise complete or partial file data. 

In a further aspect the invention provides a method of modifying peer-to-peer network
traffic in a distributed peer-to-peer network, the method comprising reading peer-to-
peer traffic on said network; identifying a response to a peer-to-peer file request within
said peer-to-peer traffic, said response including a digital fingerprint of a requested file;
identifying from a cache an alternative source for said requested file to a source of said
response; and replying to said file request using said alternative source.

Embodiments of the above described methods may be implemented using computer 
program code, for example on a general purpose or dedicated computer system. Such 
program code, and the above described processor control code, may be provided on any 
conventional carrier medium, such as a hard or floppy disk, ROM or CD-ROM, or on 
an optical or electrical signal carrier for example via a communications network. The 
code may comprise code in any conventional programming language, for example C, or 
assembler or machine code. Such code may be distributed over a plurality of coupled 
components, for example over a network, as is well known to those skilled in the art. 
Likewise data processing and storage functions may be separated but linked, for 
example by a network. Embodiments of the invention may be implemented on a variety 
of communications networks, but are particularly suited to implementation in a P2P 
network running over an IP-type protocol. 


These and other aspects of the present invention will now be further described by way 
of example only with reference to the accompanying figures in which: 

Figures 1a, 1b and 1c show, respectively, an example of a centralised peer-to-peer 
network, an example of a decentralised peer-to-peer network, and steps illustrating 
location and retrieval of a file in a decentralised peer-to-peer network; 

Figures 2a and 2b show a computer in contact with a remote web server directly and via 
a proxy cache respectively; 

Figures 3a to 3f show, respectively, a TCP/IP data packet, a P2P message header, a P2P 
pong message, a P2P query message, a P2P query hit message, and a P2P Get message; 

Figures 4a to 4c show, respectively, a P2P network, a P2P network including a gateway 
node, and an implementation of a gateway node; 

Figure 5 shows an example of an internet service provider network including a P2P 
gateway; 

Figures 6a and 6b show, respectively, an embodiment of the gateway node, and tables 
of a data store for the node of figure 6a; 

Figure 7 shows messages in a P2P network including a gateway node; 

Figure 8 shows processing of a P2P query/query hit at a gateway node; 

Figure 9 shows processing of a P2P download request at a gateway node. 

It is helpful for understanding the invention to provide some background information on 
an example of a P2P protocol, in this case the Gnutella protocol described below, further 
details of which can be found at http://www.rfc-gnutella.sourceforge.net/ . 

Figure 3a shows a conventional TCP/IP (transmission control protocol/internet protocol) data 
packet 300 comprising an IP header 302 (including IP source and destination 
addresses), a TCP header 304 (including, among other data, source and destination port 
numbers) and payload data 306. To connect to a P2P network a Gnutella node 
operating as a client establishes a TCP connection with a server (servent) and sends a 
connection request which the server acknowledges. Once a connection is established 
the two P2P nodes then communicate with each other by exchanging messages, 
sometimes called (protocol) descriptors. These P2P messages or descriptors each have 
a header 310 as shown in figure 3b comprising a node/message identifier 312, a 
message function 314, a time to live (TTL) counter 316, a hop count 318 and a payload 
length field 319. The node/message identifier may comprise a combination of a node 
identifier and a counter value to create a unique message identifier which can be used, 
for example, to prevent propagation of duplicate messages arising from loops in the 
system topology. The message function 314 specifies the message type (Ping, Pong and 
the like); the hop count 318 is incremented at each network hop from one peer to 
another and the TTL counter 316 is correspondingly decremented by one; the payload 
length field 319 specifies the length of the accompanying P2P message. 
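The header fields described above can be sketched as a fixed-size binary record. The following Python sketch assumes a 23-byte layout (a 16-byte identifier, one byte each for function, TTL and hop count, and a 4-byte little-endian payload length), which is how the Gnutella 0.4 header is commonly described; the class and method names are illustrative and do not come from this description.

```python
import struct
from dataclasses import dataclass

# Assumed layout: 16-byte identifier, then function, TTL and hops
# (1 byte each), then a 4-byte little-endian payload length.
HEADER = struct.Struct("<16sBBBI")


@dataclass
class Header:
    message_id: bytes    # node/message identifier 312
    function: int        # message function 314 (Ping, Pong, Query, ...)
    ttl: int             # time to live counter 316
    hops: int            # hop count 318
    payload_length: int  # payload length field 319

    def pack(self) -> bytes:
        return HEADER.pack(self.message_id, self.function, self.ttl,
                           self.hops, self.payload_length)

    @classmethod
    def unpack(cls, data: bytes) -> "Header":
        return cls(*HEADER.unpack(data[:HEADER.size]))

    def next_hop(self) -> "Header":
        """Header as forwarded to the next peer: TTL down, hops up."""
        return Header(self.message_id, self.function, self.ttl - 1,
                      self.hops + 1, self.payload_length)
```

A node relaying a message would call `next_hop()` before re-packing the header, so that the TTL and hop count change together as described.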

In the Gnutella protocol a Ping message (of zero payload length, not shown in figure 3) 
is used for probing the network and the response to a Ping message is a Pong message 
320 as shown in figure 3c. A Pong message includes a port number 322 and an IP 
address 324 (nodes are identified by their IP addresses), together with a field 326 
specifying the number of files shared by the node, and a field 328 specifying the 
number of kilobytes of data shared by the node. Other messages include a query 
message 330 as shown in figure 3d comprising a minimum required speed field 332 and 
a field 334 carrying search data. The response to a query is a query hit message 340 as 
shown in figure 3e. A query hit message comprises a number of hits field 342, port and 
IP address fields 344, 346, a (download) speed field 348, result data 350 and a node or 
servent identifier 352 (such as an IP address). The result data comprises a set of results 
each having a format 354 comprising a file index 356, a file hash value 358, a file size 
360 and a file name 362. The file index 356 is a number which can be used to index a 
file shared by a node; the hash value 358 is optional. Figure 3f shows a so-called push 
message 370 comprising a node identifier 372, a file index 374, a file hash value 376 
(optional), an IP address 378, and a port number 380. (Push messages are used to 
request a host behind a firewall to make an outgoing upload connection to a requesting 
node.) 
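As an illustration of how such a message body might be decoded, the sketch below parses the four Pong fields. The byte widths and ordering (2-byte port, 4-byte IPv4 address, two 4-byte counters) are an assumption based on the usual description of the Gnutella 0.4 Pong payload, and `parse_pong` is a hypothetical helper name.

```python
import ipaddress
import struct
from typing import NamedTuple

# Assumed payload layout: port (2 bytes), IPv4 address (4 bytes),
# number of shared files (4 bytes), kilobytes shared (4 bytes).
PONG = struct.Struct("<H4sII")


class Pong(NamedTuple):
    port: int              # field 322
    ip: str                # field 324
    files_shared: int      # field 326
    kilobytes_shared: int  # field 328


def parse_pong(payload: bytes) -> Pong:
    """Decode the fixed-size fields of a Pong payload."""
    port, raw_ip, files, kbytes = PONG.unpack(payload[:PONG.size])
    return Pong(port, str(ipaddress.IPv4Address(raw_ip)), files, kbytes)
```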

The Gnutella protocol also includes a set of rules for propagating messages. Thus for 
Ping and Query messages a node propagates a message to all the nodes to which it is 
directly connected except the originator of the message, that is, ping and query messages 
are broadcast to all neighbours. Pong, query hit and push messages are sent back along 
the same path as that which carried the initial associated message (ping or query). A 
node does not propagate a message with an identical identifier to a message it has 
previously received. When a message is propagated a node decrements the TTL field and 
increments the hop count. 
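These propagation rules can be sketched as a small routing function. This is a simplified model rather than a protocol implementation: it assumes that a reply carries the identifier of the request it answers, and that a message whose TTL reaches zero is dropped; the `Node` class and its attribute names are illustrative.

```python
from typing import Dict, List, Set, Tuple

BROADCAST = {"ping", "query"}                # flooded to all neighbours
ROUTED_BACK = {"pong", "query_hit", "push"}  # returned along the request path


class Node:
    def __init__(self, neighbours: List[str]) -> None:
        self.neighbours = neighbours
        self.seen: Set[bytes] = set()          # identifiers already handled
        self.came_from: Dict[bytes, str] = {}  # request identifier -> previous hop

    def forward(self, kind: str, msg_id: bytes, ttl: int, hops: int,
                sender: str) -> Tuple[List[str], int, int]:
        """Return (next hops, new TTL, new hop count) for a received message."""
        ttl, hops = ttl - 1, hops + 1
        if kind in BROADCAST:
            if msg_id in self.seen or ttl <= 0:
                return [], ttl, hops           # duplicate or expired: drop
            self.seen.add(msg_id)
            self.came_from[msg_id] = sender
            return [n for n in self.neighbours if n != sender], ttl, hops
        if kind in ROUTED_BACK:
            # Assumption: a reply carries the identifier of its request.
            back = self.came_from.get(msg_id)
            return ([back] if back else []), ttl, hops
        return [], ttl, hops                   # unknown message type
```

Recording the previous hop for each request identifier is what allows replies to retrace the path without any node knowing the full route.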



Once a file has been located it is downloaded using the HTTP (Hyper Text Transfer 
Protocol) GET command. Since a query hit includes an IP address and port as well as 
an index, file size and file name, the GET command can be used to request the 
selected file, indicating the file index and file name and an HTTP version number such as 
version 1.0. 

An example of a P2P session is given below: 

Client: QUERY "Madonna American Pie" 

Server: QUERYHIT IP Address 

MADONNA - AMERICANPIE.MP3, <INDEX>,<HASH> 

Client: GET /get/<INDEX>/MADONNA - AMERICANPIE.MP3 HTTP/1.0 
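The final step of the session above might be generated as follows. This is a minimal sketch that only formats the request line shown in the example; the function name is hypothetical, and any additional HTTP headers a real client would send are omitted.

```python
def build_get_request(file_index: int, file_name: str) -> bytes:
    """Format the download request for one result of a query hit."""
    request_line = f"GET /get/{file_index}/{file_name} HTTP/1.0\r\n"
    # A blank line terminates the (empty) header section of the request.
    return (request_line + "\r\n").encode("ascii")
```

The request would be sent over a TCP connection to the IP address and port given in the query hit.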

Figures 4a and 4b illustrate the concept of a gateway node. Figure 4a shows a peer-to-peer 
network with a plurality of nodes 400 some of which are within a network 402 
managed by a service provider, and others of which are located elsewhere in the Internet 
404. It can be seen that nodes within the ISP's network each have a number of peers in 
the outside world which results in a large number of connections (shown as lines in 
figure 4a) across the boundary 406 of the ISP's network. Since every packet which 
crosses boundary 406 costs the ISP money this results in high network costs. Moreover, 
the large number of connections leads to an untidy topology which is difficult for the ISP 
to manage. 

Figure 4b shows the effect of introducing a gateway node 408 comprising an active node 
on the network which acts as a gatekeeper at the edge of the ISP's network 402. All P2P 
requests are routed through gateway node 408 and attempts to connect to nodes outside 
the ISP's network result in connections to the gateway node. This facilitates a high level of 
interconnectivity within the ISP's network and a small number of fixed connections, for 
example eight, crossing the expensive ISP boundary 406. Since all traffic is routed 
through the gateway node 408 this node can perform the function of a cache router, a 
bandwidth throttle, and a content filter. The gateway node 408 monitors and routes P2P 
traffic transparently and also stores information relating to P2P queries and the 
associated responses (query hits) in a database. In preferred embodiments query hits 
are also rewritten so that they refer (directly or indirectly) to indexes of files on the 
gateway node. The destination port number and internet address of a query hit may also 
be rewritten to point to the gateway node. Since the gateway node stores the results of 
past searches it may itself respond to search queries based upon these stored results. 
The gateway node 408 may also store contents (files) indexed based upon a key or 
checksum. A file checksum from a query hit packet may be used to access the database 
for a subsequent P2P download. 

Still referring to figures 4a and 4b, consider a network with a number X of hosts on the 
ISP network and a number Y of hosts outside the network, each node having P peers 
which are randomly selected from all the nodes on the network. Then for each 
connection from each internal node, the likelihood of a peer being outside of the 
network is: 

$$\frac{Y}{X+Y}$$

Therefore, the total number of connections crossing the network border will be: 

$$\frac{XPY}{X+Y}$$

Typical values for an ISP with 10,000 active P2P users would be: X=10,000, 
Y=1,000,000 and P=4. This results in: 

$$\frac{10{,}000 \times 4 \times 1{,}000{,}000}{10{,}000 + 1{,}000{,}000} \approx \frac{4 \times 10^{10}}{10^{6}} = 4 \times 10^{4}$$

The effect of 40,000 permanently established TCP connections constantly relaying data 
is that large amounts of data are transferred over the ISP's boundary, resulting in high 
network costs. In particular, with the Gnutella network, network traffic represents a 
considerable proportion of the overall P2P traffic, possibly as much as 40% of the 
overall traffic. 

With a gateway node, this is reduced to a small fixed number of connections (typically 
8) independent of the number of users on the network. At a simplistic level this reduces 
the network bandwidth used by a factor of about 5,000, a 99.98% reduction. (To 
accurately model the traffic saving an understanding of the rules about query routing 
and the propagation of messages is needed. This is highly specific to the precise P2P 
network and the configuration of the clients, but the overall results achieved are still 
approximately the same.) 
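The arithmetic behind these figures can be checked with a few lines of Python. The function name is illustrative; the values of X, Y, P and the eight gateway connections are those quoted in the text, and the model is the same simplistic one (every connection is assumed to carry equal traffic).

```python
def boundary_connections(x: int, y: int, p: int) -> float:
    """Expected number of peer connections crossing the ISP boundary."""
    return x * p * y / (x + y)


without_gateway = boundary_connections(10_000, 1_000_000, 4)
with_gateway = 8  # fixed number of gateway connections quoted in the text

factor = without_gateway / with_gateway
reduction = 1 - with_gateway / without_gateway

print(round(without_gateway))  # 39604, i.e. roughly 40,000
print(round(factor))           # 4950, i.e. a factor of about 5,000
print(f"{reduction:.2%}")      # 99.98%
```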

As previously mentioned, as well as reducing the amount of traffic being relayed the 
gateway node also stores data based upon the search traffic and responses that it sees. 
This enables it to build up a database of the locations of files on the network. This 
information may then be used for intelligent routing and caching of subsequent 
download requests. 

Referring now to figure 4c, this illustrates one implementation of a gateway node 408 
comprising, in this example, a P2P caching router. As described further below such a 
P2P caching router may comprise conventional computer hardware coupled with 
routing hardware and suitable program code. In figure 4c a plurality of computers 410, 
typically personal computers, are coupled via a network 412 to a router 414, network 
412 and router 414 typically comprising part of an ISP's network infrastructure. Non-P2P 
traffic from router 414 is routed directly to internet 416 whilst P2P traffic is routed 
to P2P caching router 408 and thence to internet 416. This allows the ISP to reduce the 
level of network traffic on the network managed by the ISP and more particularly to 
reduce the amount of "upstream" bandwidth required by the ISP, that is bandwidth to the 
Internet external to the ISP. The gateway node 408 preferably caches both network and 
download traffic and may cache one or both of inbound and outbound traffic (forward 
and reverse caching). Router 414 may be configured to recognise P2P traffic based 
upon, for example, the destinations/ports or by looking at or snooping packet contents. 

Referring now to figure 5, this shows further details of the arrangement of figure 4c, and 
like elements are indicated by like reference numerals. Thus in figure 5 PCs 410 are 
coupled to an ADSL (Asymmetric Digital Subscriber Line) or cable modem 502 with an 
IP backbone connection 514 to a network 412 managed by ISP 500 but typically 
operating over physical network hardware provided by a telephone or cable company. 
The ISP router 414 and P2P gateway 408 are, in the example shown, coupled to a 
common router 506 which connects the ISP to a backbone or core network 508 again, 
for example, provided by a cable company. Router 506 may separate incoming P2P 
traffic for gateway 408 in a similar way to that in which router 414 handles outgoing 
P2P traffic or, alternatively, both incoming and outgoing P2P traffic may be identified 
by router 414 and sent via gateway 408. Backbone 508 may provide a link to other 
portions of the ISP's network, as well as one or more links 512 to the networks of other 
internet service providers; generally backbone 508 will also include a high bandwidth 
connection into the Internet 416. 

As previously mentioned, ISP router 414 (and/or router 506) may identify P2P traffic in 
a number of ways. For example the Gnutella protocol comprises HTTP traffic sent to 
ports 6346 and 6347, whilst KaZaa comprises HTTP traffic sent to port 1214 (although 
version 2.0 of KaZaa selects a random port for incoming P2P connections). The 
destination port of a P2P message may be read from the message (see, for example, 
field 322 of the Pong message shown in figure 3c). Where there is no fixed port for a 
P2P message a P2P packet may potentially be identified by reading the payload of a 
packet, for example to identify a P2P header or other P2P protocol format data. 
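A router's two-stage test, port first and payload second, might look like the sketch below. The port numbers are those cited above; the payload signature is an assumption (the ASCII prefix with which a Gnutella connection request is generally understood to begin), and `classify_packet` is a hypothetical name.

```python
from typing import Optional

# Ports cited in the description above.
KNOWN_PORTS = {6346: "gnutella", 6347: "gnutella", 1214: "kazaa"}

# Assumption: a Gnutella connection opens with this ASCII handshake prefix.
PAYLOAD_SIGNATURES = {b"GNUTELLA CONNECT/": "gnutella"}


def classify_packet(dest_port: int, payload: bytes = b"") -> Optional[str]:
    """Return the P2P protocol a packet appears to belong to, or None."""
    protocol = KNOWN_PORTS.get(dest_port)
    if protocol is not None:
        return protocol
    # No fixed port matched: fall back to inspecting the payload.
    for signature, name in PAYLOAD_SIGNATURES.items():
        if payload.startswith(signature):
            return name
    return None
```

Packets for which the function returns a protocol name would be diverted to the gateway; all others would be routed directly to the Internet.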

Figures 6a and 6b show details of the P2P gateway 408 of figure 5. Thus the 
gateway may comprise a conventional computer system including a processor 602, a 
working memory 604, permanent program memory 606, a data store 608 and (optional) 
user interface 612 all linked by a common data and control bus 614. The gateway 408 
also includes a data communications card 616 linked to bus 614 to provide physical data 
communications interfaces, packet processing and routing functionality in accordance 
with control exercised by processor 602. In the illustrated example three data 
communications connections are provided, a first bi-directional communications link 618 to 
an "internal" ISP network, for example physically provided by a cable company, which, 
generally speaking, will not have an associated per byte cost. A second bi-directional 
communications link 620 may also be provided to a second physical network, for 
example of a second cable company, which may provide a reduced cost packet data 
connection. A third bi-directional communications link 622 provides a connection to 
external networks (the "rest of the world"), in particular the Internet. 

Permanent program memory stores operating system code, (optional) user interface 
code, data communications control code for controlling data communications card 616, 
TCP/IP code, P2P protocol code, query/queryhit handling code (described below), and 
download request handling code (described further below). This code is loaded and 
implemented by processor 602 to provide the corresponding functions for the gateway 
node 408. Some or all of this code may be provided on a carrier medium illustratively 
shown by removable storage medium 607, such as a CD-ROM. 

Data store 608 stores cached data files and cached query hits. Figure 6b shows file 
cache tables of data store 608 comprising a source table and a cache table. The cache 
tables are indexed by a cache ID which, in one embodiment, comprises a number 
between 0 and 2^^. Files in data store 608 are indexed by the cache ID, in one 
embodiment a portion of the cache ID comprising a directory identifier and a portion of 
the cache ID comprising the file name (for example /123/456/789). Files stored in data 
store 608 may comprise either complete or partial files; in one embodiment data store 
608 comprises approximately 1TB of RAID storage. 

In the cache table, associated with the cache ID, is a hash value such as an MD5 or SHA 
value, an "InOurCache" flag to indicate whether or not the identified file is cached, a 
time value to provide a timeout for deleting old files, and optionally a file name. 
Details of SHA1 (US Secure Hash Algorithm 1) can be found in RFC (Request for 
Comments) 3174; details of message digest (MD) functions such as MD2 and MD5 
can be found on the website of RSA Data Security, Inc and in RFCs 1319 - 1321. 
Broadly speaking a hash function generates a fixed length output from a variable length 
input, the output providing a representation of the input file or message. It is desirable 
that a hash function is collision resistant, that is, that it is unlikely that two different 
input messages will result in the same output. The MD5 algorithm provides a 128-bit 
(16-byte) fingerprint or message digest of the input in such a way that it is extremely 
unlikely that two files with different contents will have the same message digest. The 
SHA1 algorithm is similar but produces a 20-byte output. 
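The digest sizes mentioned above can be confirmed with Python's standard `hashlib` module. The `fingerprint` helper is an illustrative name; how the gateway actually computes or receives its hash values is not specified here.

```python
import hashlib


def fingerprint(data: bytes, algorithm: str = "sha1") -> str:
    """Return the hexadecimal digest used as a file's digital fingerprint."""
    return hashlib.new(algorithm, data).hexdigest()


# Digest lengths in bytes: 16 for MD5 and 20 for SHA1.
print(hashlib.md5().digest_size, hashlib.sha1().digest_size)
```

Because the digest depends only on the file contents, two copies of the same file held by different nodes under different names yield the same fingerprint, which is what allows the cache to recognise them as one file.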

The cache ID in the cache table links to one or more Source IDs for one or more remote 
sources (that is external to the cache), each having an IP address, port and remote 
machine index. The cache for queries/query hits is similar but includes the file name 
which is not needed for the file cache. 
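One way to picture the two tables of figure 6b is as a pair of record types keyed by cache ID. The sketch below is illustrative only: the field names are chosen here, and the mapping from cache ID to path assumes a nine-digit decimal ID split into three-digit groups, which is merely consistent with the /123/456/789 example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SourceRecord:
    source_id: int
    cache_id: int
    ip: str
    port: int
    remote_index: int  # index of the file on the remote machine


@dataclass
class CacheRecord:
    cache_id: int
    hash_value: str
    in_our_cache: bool = False       # the InOurCache flag
    timestamp: float = 0.0           # used to time out old files
    file_name: Optional[str] = None  # optional in the file cache


@dataclass
class DataStore:
    cache: Dict[int, CacheRecord] = field(default_factory=dict)
    sources: List[SourceRecord] = field(default_factory=list)

    def sources_for(self, cache_id: int) -> List[SourceRecord]:
        """All remote sources linked to one cache ID."""
        return [s for s in self.sources if s.cache_id == cache_id]


def cache_path(cache_id: int) -> str:
    """Map a cache ID to a directory/file path (assumed 3-digit groups)."""
    digits = f"{cache_id:09d}"
    return "/" + "/".join(digits[i:i + 3] for i in range(0, len(digits), 3))
```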

Figure 7 illustrates messages flowing in a P2P network including a gateway node as 
illustrated in figure 5, showing steps in locating and then downloading a file. The steps 
show messages flowing between a user node such as one of personal computers 410, the 
P2P gateway 408 and a remote P2P node such as a node within internet space 404 in 
figure 4b. 

Initially user node 410 issues a query 700 which is received by P2P gateway 408. If 
this query includes a hash value for a requested file the gateway may be able to respond 
immediately with a query hit 702 specifying a location for the file, either in the cache or 
in some other location, preferably within the ISP's network. Likewise if P2P gateway 
408 includes a cache of query hits previously sent in response to queries, this cache 
including file names, then even if query 700 does not include a hash value P2P gateway 
408 may be able to respond with a query hit 702 based upon a query hit stored in data 
store 608 in response to a previous similar query. 
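The two ways of answering a query from stored data, by hash value or by file name, can be sketched as follows. What counts as a "similar" query is not defined in the text, so the keyword match used here (every search word must appear in the stored file name) is purely an assumption, as are the type and function names.

```python
from typing import List, NamedTuple, Optional


class CachedHit(NamedTuple):
    file_name: str
    hash_value: str
    ip: str
    port: int
    index: int


def answer_query(search: str, hits: List[CachedHit],
                 hash_value: Optional[str] = None) -> List[CachedHit]:
    """Return cached query hits matching a query; an empty list means relay it."""
    if hash_value is not None:
        return [h for h in hits if h.hash_value == hash_value]
    # No hash in the query: match every search keyword against file names.
    words = search.lower().split()
    return [h for h in hits
            if words and all(w in h.file_name.lower() for w in words)]
```

An empty result corresponds to the case, described next, in which the gateway cannot respond directly and relays the query.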

If P2P gateway 408 is not able to, or is not configured to, respond directly to user node 410 
the query is relayed in accordance with the P2P protocol to a remote P2P node. This 
broadcasting step may involve preferentially broadcasting to nodes within the ISP's 
network or otherwise limiting propagation of the query 700 outside the ISP's network. 
In accordance with the P2P protocol a query hit 704 in response to query 700 is relayed 
back to P2P gateway 408. Alternatively where P2P gateway 408 does not comprise an 
active node on the network query hit 704 may nonetheless be sent via P2P gateway 408 
by one of routers 414, 506. 

When query hit 704 is received by P2P gateway 408 the hash value in the query hit is 
read and if there is no entry for the hash value in the cache 608 the query hit is added to 
the cache and a corresponding cache ID is created, thus linking the source of the query 
hit, cache ID and hash value. Optionally the file name may also be included in the 
cache. Storing query hits in this way facilitates reducing P2P network search traffic. 

If on reading the hash value in query hit 704 it is determined a cached version of a 
requested file exists within data store 608 then the cache ID of the requested file is 
substituted for the index in the query hit. This is repeated for all the cached files 
identified in the query hit. Optionally the IP address of the gateway may be substituted 
for the source address. The port number of the query hit may also be modified to a 
known or assigned P2P port number to facilitate subsequent identification of a packet as 
a P2P data packet (particularly where the P2P protocol employs dynamic port 
allocation). 

The query hit 704 is then relayed back to user node 410 which subsequently issues an 
HTTP GET request to download the file. Although this request is sent to the address 
and port specified in the query hit the request is intercepted by the P2P gateway 408. 
Then using either the hash value (present in the GET request) or the index as a cache ID 
the gateway 408 checks whether the requested file is stored in the cache. If the 
requested file is cached the gateway node 408 responds immediately with the requested file 
708 but if the file is not cached the gateway selects a source for the requested file using 
the source table of data store 608. (Where no source is listed in the cache the gateway 
may use the source indicated by the requesting user node 410 in the GET request.) The 
gateway 408 then issues its own GET request 710, selecting a source which may be the 
remote P2P node which responded with query hit 704 or which may comprise an 
alternative remote P2P node, for example a P2P node on a network which it is cheaper 
for the ISP to access or more preferably a P2P node within the ISP's own network. The 
file 712 is then retrieved from this P2P node, stored within the cache, and then served 
708 to the requesting user node 410. 

Figures 8 and 9 show flow diagrams illustrating in more detail the procedure described 
above with reference to figure 7. 

Figure 8 illustrates the operation of an embodiment of query/queryhit handling code 
within gateway 408. Thus at step S800 the gateway 408 receives a P2P query and, at 
step S802, checks whether it is able to respond from the cache. If gateway 408 is 
able to respond from the cache it does so at step S804 and the procedure then ends. 
However if the gateway is unable to respond directly it relays the query to its intended 
destination at step S806 and then waits to receive a query hit response including one or 
more file hash values at step S808. The gateway node then checks, at step S810, 
whether the one or more hash values received at step S808 are in the cache table of data 
store 608. 

If there is no entry for the hash value within the cache table then, at step S816, the gateway 
node assigns a free cache ID to the hash value and adds a new record to the cache table 
comprising the assigned cache ID, the hash value and a time/date stamp. If the hash 
value is present then, at step S812, the corresponding cache ID is read from the cache 
table and, at step S814, the query hit is rewritten with the cache ID as the index and, 
optionally, with a standard P2P protocol port number to facilitate later P2P packet 
processing. Although it is not necessary, the gateway also rewrites the IP address in the 
query hit to point to the gateway. 

The procedure then updates the cache source table record using the hash value and/or 
cache ID index (S818) to add a new source table record comprising a source ID, cache 
ID, IP address, port and remote (machine) index. Optionally, at step S820, a search 
cache comprising corresponding information but also including a file name (to facilitate 
responding to queries without hash values) may also be updated. Then at 
step S822 the rewritten query hit is sent back to the requester and the procedure ends at 
step S824. 
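Steps S810 to S822 can be sketched in Python as below. This is one reading of the flow rather than the claimed implementation: it rewrites the index on both branches (new and existing hash values), allocates cache IDs sequentially, and always rewrites the address, none of which the text mandates. All class and field names are illustrative.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Result:
    index: int
    hash_value: str
    file_name: str


@dataclass
class QueryHit:
    ip: str
    port: int
    results: List[Result]


@dataclass
class Gateway:
    ip: str
    p2p_port: int = 6346
    cache_ids: Dict[str, int] = field(default_factory=dict)  # hash -> cache ID
    stamps: Dict[int, float] = field(default_factory=dict)   # cache ID -> time
    sources: List[Tuple[int, str, int, int]] = field(default_factory=list)

    def process_query_hit(self, hit: QueryHit) -> QueryHit:
        rewritten = []
        for r in hit.results:
            cache_id = self.cache_ids.get(r.hash_value)      # S810
            if cache_id is None:                             # S816: new record
                cache_id = len(self.cache_ids) + 1
                self.cache_ids[r.hash_value] = cache_id
                self.stamps[cache_id] = time.time()
            # S818: remember where the file can really be found.
            record = (cache_id, hit.ip, hit.port, r.index)
            if record not in self.sources:
                self.sources.append(record)
            # S812/S814: the cache ID replaces the remote index.
            rewritten.append(Result(cache_id, r.hash_value, r.file_name))
        # S822: the hit returned to the requester points at the gateway.
        return QueryHit(self.ip, self.p2p_port, rewritten)
```

Because the source table keeps the original address, port and remote index, the gateway can still reach the real source later even though the requester only ever sees the gateway's address and the cache ID.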

Figure 9 shows a flow diagram of a procedure for handling a user download request 
received from a node such as user node 410. 

At step S900 the gateway node 408 receives a user download (GET) request and at step 
S902 checks whether the request includes a hash value. If the request does include a 
hash value this is read, at step S904, for accessing the cache tables; if not the rewritten 
index, that is the cache ID, is extracted from the file request for accessing the cache 
table (step S906). Then, at step S908, the procedure checks whether the requested file is 
within the cache and if so serves the file to the requesting node from the cache (step 
S910) and the procedure then ends at step S912. If the requested file is not in the cache 
the procedure checks, at step S914, whether or not a source identifier for the requested 
file is stored in the cache. If so, at step S916, the available sources for the file are read 
from the cache and one is selected, either randomly or based upon a cost (monetary or 
otherwise). For example an internet service provider may have an arrangement with a 
cable company to provide access to users connected to that cable company network at 
reduced rates compared with other upstream access from the ISP's network. If, at step 
S914, there is no source for the file identified in the cache the intended destination of 
the download request is selected as the source (step S918), for example from the 
"envelope" of the download request. Then, at step S920, the gateway connects to the 
identified source and downloads the requested file, saving it in the cache and updating 
the InOurCache flag (S922) and sending the file to the requester (S924), the procedure 
then ending at step S926. It will be appreciated that even when the gateway node is 
unable to read the network search traffic, for example because it is encrypted, where the 
download request includes a hash value a file may be served to the requester from the 
cache to significantly reduce at least download traffic on the P2P network. It will 
further be appreciated that a requesting user may obtain so called FastTrack file access 
by simply sending a GET request including the hash value for the desired file to the 
gateway (since the hash function algorithms are widely known, hash values for 
commonly accessed files may be readily listed). 
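The download procedure of figure 9 can be sketched as follows. The sketch assumes the lookup key (hash value or cache ID) has already been extracted at steps S902 to S906, selects the cheapest known source rather than a random one, and takes the actual transfer as a caller-supplied `fetch` function standing in for the gateway's own GET request; these names and choices are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Source:
    address: str
    cost: float  # monetary or other cost of fetching from this source


@dataclass
class DownloadCache:
    files: Dict[str, bytes] = field(default_factory=dict)           # key -> data
    sources: Dict[str, List[Source]] = field(default_factory=dict)  # key -> sources

    def handle_get(self, key: str, intended: Source,
                   fetch: Callable[[Source], bytes]) -> bytes:
        """Serve a download request; `key` is the hash value or cache ID."""
        if key in self.files:                          # S908
            return self.files[key]                     # S910: serve from cache
        known = self.sources.get(key)                  # S914
        if known:
            source = min(known, key=lambda s: s.cost)  # S916: cheapest source
        else:
            source = intended                          # S918: requested source
        data = fetch(source)                           # S920: download the file
        self.files[key] = data                         # S922: now in the cache
        return data                                    # S924: send to requester
```

A second request for the same key is served from `files` without calling `fetch` again, which is where the saving in download traffic arises.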

No doubt many other effective alternatives will occur to the skilled person. For 
example although specific embodiments of the invention have been described with 
reference to P2P networks operating over TCP/IP, the principles described above may 
be applied to P2P networks operating over other protocols (e.g. UDP), and in other 
environments such as, for example, mobile communications systems, wireless computer 
networks and the like. 

The invention encompasses modifications apparent to those skilled in the art lying 
within the scope of the amended claims. 

CLAIMS: 

1. A method of reducing traffic in a decentralised peer-to-peer network, said peer-to-peer 
network operating over an underlying network comprising first and second 
network portions, the method comprising: 

routing a peer-to-peer message in one of said network portions with an intended 
destination in the other of said network portions to a gateway between peer-to-peer 
nodes residing on said first and second network portions; and 

controlling transport of said message at said gateway to limit propagation of said 
message into said other of said network portions. 

2. A method as claimed in claim 1 wherein said first network portion comprises a 
portion of said underlying network managed by a first entity and said second network 
portion comprises a portion of said underlying network connected to said first network 
portion across a boundary. 

3. A method as claimed in claim 2 implemented to limit a number of peer-to-peer 
connections across said boundary to a permitted maximum. 

4. A method as claimed in claim 1, 2 or 3 wherein said transport controlling 
comprises blocking said message at said gateway. 

5. A method as claimed in claim 1, 2 or 3 wherein said transport controlling 
comprises redirecting said message to a peer-to-peer node within said one of said 
network portions. 

6. A method as claimed in claim 1, 2 or 3 wherein said transport controlling 
comprises responding to said message from said gateway. 

7. A method as claimed in claim 6 wherein said message comprises a query, and 
wherein said responding comprises sending a response to said query comprising cached 
data derived from previous responses to queries. 

8. A method as claimed in claim 6 wherein said message comprises a file request, 
and wherein said responding comprises sending a response to said file request 
comprising previously cached data for a requested file. 

9. A method as claimed in claim 1 or 2 wherein said message comprises a file 
request message, and wherein said controlling comprises modifying a response to a 
previous file search request such that said response does not indicate that a requested 
file may be found in said other of said network portions. 

10. A method as claimed in claim 9 wherein a said requested file is identified by a 
hash value. 

11. A method as claimed in claim 9 or 10 further comprising storing requested files 
in a cache, and wherein said response is modified to refer to said cache. 

12. A method as claimed in claim 9 or 10 wherein said underlying network 
comprises a third network portion, and wherein said modifying comprises modifying 
said response to indicate that said requested file is obtainable from a peer-to-peer node 
located on said third network portion. 

13. A method as claimed in claim 1, 2 or 3 wherein said physical network comprises a 
third network portion, wherein use of each of said network portions has an associated 
cost, wherein data transport over said third network portion has a cost less than a cost 
associated with said other of said network portions, and wherein said controlling 
comprises directing said message into said third network portion. 



14. A method as claimed in claim 1 or 2 wherein a said peer-to-peer message has a 
message identifier, and wherein said controlling comprises: 

storing said message identifier for said message, 

monitoring message identifiers of messages passing through said gateway, and 

limiting propagation of said identified message such that said message passes 
between said first and second network portions no more than a permitted maximum 
number of times. 

15. A method as claimed in claim 14 wherein said permitted maximum number of 
times is one. 

16. A method as claimed in any preceding claim wherein said network portions 
comprise domains of an internet. 

17. A method as claimed in any preceding claim wherein said one of said network 
portions comprises said first network portion and said other of said network portions 
comprises said second network portion. 

18. A computer network message controller for reducing traffic in a decentralised 
peer-to-peer network, said peer-to-peer network operating over a physical network 
comprising first and second network portions, said network message controller 
comprising: 

a router for routing a peer-to-peer message in one of said first network portions 
with an intended destination in the other of said network portions to a gateway between 
peer-to-peer nodes residing on said first and second network portions; and 

a gateway controller configured to control transport of said message into said 
other of said network portions. 

19. A computer network message controller as claimed in claim 18 wherein said 
first network portion comprises a portion of said physical network managed by a first 
entity and said second network portion comprises a portion of said physical network 
connected to said first network portion across a boundary. 

20. A computer network message controller as claimed in claim 19 wherein said 
gateway controller is configured to 
limit a number of peer-to-peer connections across said boundary to a permitted 
maximum. 

21. A compute network message controller as claimed in claim 1 8, 19 or 20 
wherein said gateway controller is configured to block said message at said gateway. 

22. A computer network message controller as claimed in claim 18, 19 or 20 
wherein said gateway controller is configured to redirect said message to a peer-to-peer 
node within said one of said network portions. 

23. A computer network message controller as claimed in claim 18, 19 or 20 
wherein said gateway controller is configured to respond to said message. 

24. A computer network message controller as claimed in claim 23 further 
comprising a cache to store data, wherein said message comprises a query, and wherein 
said gateway controller is configured to send a response to said query including data 
from said cache. 

25. A computer network message controller as claimed in claim 23 wherein said 
message comprises a file request, further comprising a cache to store data derived from 
previous responses to file requests, and wherein said gateway controller is configured to 
send a response to said file request including data from said cache. 

26. A computer network message controller as claimed in claim 18 or 19 wherein 
said message comprises a file request message, and wherein said gateway controller is 
configured to modify a response to a previous file search request such that said response 
does not indicate that a requested file may be found in said other of said network 
portions. 

27. A computer network message controller as claimed in claim 26 wherein a said 
requested file is identified by a hash value. 

28. A computer network message controller as claimed in claim 26 or 27 further comprising a 
cache for storing requested files, and where said gateway controller is configured to 
modify said response to refer to said cache. 

29. A computer network message controller as claimed in claim 18, 19 or 20 wherein said 
underlying network comprises a third network portion, and wherein said gateway 
controller is configured to modify said response to indicate that said requested file is 
obtainable from a peer-to-peer node located on said third network portion. 

30. A computer network message controller as claimed in claim 18 or 19 wherein a 
said peer-to-peer message has a message identifier, and wherein said gateway controller 
is configured to store said message identifier for said message, monitor message 
identifiers of messages passing through said gateway, and limit propagation of said 
identified message such that said message passes between said first and second network 
portions no more than a permitted maximum number of times. 

31. A computer network message controller as claimed in claim 30 wherein said 
permitted maximum number of times is one. 

32. A computer network message controller as claimed in any one of claims 18 to 
31 wherein said one of said network portions comprises said first network portion and 
said other of said network portions comprises said second network portion, and wherein 
said router and said gateway controller comprise part of said first network portion. 

33. A computer network message controller as claimed in any one of claims 18 to 32 
wherein said one of said network portions comprises said first network portion and said 
other of said network portions comprises said second network portion. 

34. A computer network message controller as claimed in any one of claims 18 to 
33 wherein said gateway controller comprises a processor, and program memory 
storing processor control code coupled to said processor to load and implement said 
code, said code comprising code to configure said gateway controller to operate as 
claimed in any one of claims 18 to 33. 

35. A carrier carrying the processor control code of claim 34. 

36. A gateway controller, in particular for the computer network message controller 
of claim 18, for reducing traffic in a decentralised peer-to-peer network operating over 
an underlying network comprising first and second network portions, the controller 
being configured for operation at a gateway between peer-to-peer nodes residing on said 
first and second network portions, the gateway controller comprising: 

an interface for said first and second network portions, for receiving a peer-to- 
peer message in one of said first network portions with an intended destination in the 
other of said network portions; and 

a controller configured to control transport of said message into said other of 
said network portions. 

37. A gateway controller as claimed in claim 36 wherein said controller is 
configured to block said message at said gateway. 

38. A gateway controller as claimed in claim 36 or 37 wherein said controller is 
further configured to redirect a said message to a peer-to-peer node within said one of 
said network portions. 

39. A gateway controller as claimed in claim 36, 37 or 38 wherein said controller is 
further configured to respond to a said message. 

40. A gateway controller as claimed in claim 39 comprising a query cache to store 
data derived from responses to queries, and wherein said controller is configured to 
respond to a said query using data from said query cache. 

41. A gateway controller as claimed in claim 39 or 40 further comprising a file 
request cache to store data derived from responses to file requests, and wherein said 
controller is configured to respond to a said file request using data from said file request 
cache. 
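
As a non-limiting illustration, a query cache of the kind recited in claims 40 and 41 might be sketched as follows. The class QueryCache, its methods store and respond, and the normalisation of query strings (lower-casing and collapsing whitespace) are all assumptions introduced for this sketch and do not appear in the specification.

```python
from typing import Dict, List, Optional


class QueryCache:
    """Holds data derived from earlier responses, keyed by query (hypothetical)."""

    def __init__(self) -> None:
        self._results: Dict[str, List[str]] = {}

    @staticmethod
    def _key(query: str) -> str:
        # Assumed normalisation so that trivially different queries share an entry.
        return " ".join(query.lower().split())

    def store(self, query: str, results: List[str]) -> None:
        """Record data derived from a response to the given query."""
        self._results[self._key(query)] = list(results)

    def respond(self, query: str) -> Optional[List[str]]:
        """Return cached results, or None if the gateway has nothing to offer."""
        cached = self._results.get(self._key(query))
        return list(cached) if cached is not None else None
```

A gateway using such a cache would answer from it where respond() returns a list, and otherwise apply one of the other measures recited in the claims, such as blocking or redirecting the message.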

42. A gateway controller as claimed in any one of the claims 36 to 41 wherein said 
first and second network portions comprise physical portions of said underlying 
network. 

43. A gateway controller as claimed in claim 36 wherein said message comprises a 
file request message, and wherein said controller is configured to modify a response to a 
previous file search request such that said response does not indicate that a requested 
file may be found in said other of said network portions. 

44. A gateway controller as claimed in claim 43 wherein a said requested file is 
identified by a hash value. 

45. A gateway controller as claimed in claim 43 or 44 further comprising a cache for 
storing requested files, and wherein said controller is configured to modify said 
response to refer to said cache. 
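
By way of example only, the response modification of claims 43 to 45 might be sketched as follows. A response is assumed here to be a list of (source address, file hash) pairs, and the function name rewrite_hits, the use of an IP prefix to stand for the first network portion, and the cache address are hypothetical choices made for the sketch.

```python
import ipaddress
from typing import Iterable, List, Set, Tuple

Hit = Tuple[str, str]  # (source IP address, file hash); an assumed representation


def rewrite_hits(hits: Iterable[Hit], local_net: str,
                 cache_addr: str, cached_hashes: Set[str]) -> List[Hit]:
    """Rewrite a response so that it names no source outside local_net."""
    network = ipaddress.ip_network(local_net)
    rewritten: List[Hit] = []
    for source_ip, file_hash in hits:
        if ipaddress.ip_address(source_ip) in network:
            rewritten.append((source_ip, file_hash))   # local source: keep as-is
        elif file_hash in cached_hashes:
            rewritten.append((cache_addr, file_hash))  # refer to the cache instead
        # Otherwise the hit is dropped, so the response no longer indicates
        # that the file may be found in the other network portion.
    return rewritten
```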

46. A gateway controller as claimed in claim 36 wherein said underlying network 
comprises a third network portion, and wherein said controller is configured to modify 
said response to indicate said requested file is obtainable from a peer-to-peer node 
located on said third network portion. 

47. A gateway controller as claimed in claim 36 wherein a said peer-to-peer 
message has a message identifier, and wherein said controller is configured to store said 
message identifier for said message, monitor message identifiers of messages passing 
through said gateway, and limit propagation of said identified message such that said 
message passes between said first and second network portions no more than a 
permitted maximum number of times. 

48. A gateway controller as claimed in claim 47 wherein said permitted maximum 
number of times is one. 

49. A gateway controller as claimed in claim 36 wherein said first network portion 
comprises a portion of said underlying network managed by a first entity and said 
second network portion comprises a portion of said underlying network connected to 
said first network portion across a boundary, and wherein said controller is configured 
to provide a limited number of peer-to-peer connections across said boundary. 

50. A gateway controller as claimed in any one of claims 36 to 49 wherein said one 
of said network portions comprises said first network portion and said other of said 
network portions comprises said second network portion. 

51. A gateway controller as claimed in any one of claims 36 to 50 wherein said 
network portions comprise domains of an internet. 

52. A gateway controller as claimed in any one of claims 36 to 51 wherein said 
controller comprises a processor, and program memory storing processor control code 
coupled to said processor to load and implement said code, said code comprising code 
to configure said controller to control transport of said message into said other of said 
network portions. 

53. A carrier carrying the processor control code of claim 52. 

54. A peer-to-peer network cache comprising: 

a network interface for interfacing to a network over which said peer-to-peer 
network operates; 

a data store for storing cached data files each in association with a data file 
identifier, a data file identifier comprising a value computed from the contents of a data 
file it identifies; 

program memory storing processor control code; and 

a processor coupled to said network interface, to said data store, and to said 
program memory for implementing said processor control code, said code comprising 
code for controlling the processor to: 

read peer-to-peer traffic on said network; 

identify a request for a data file within said peer-to-peer traffic; 

identify the requested data file from a said data file identifier within said peer-to- 
peer traffic; and 

provide said requested data file from said data store to a peer-to-peer node 
making said request. 

55. A peer-to-peer network cache as claimed in claim 53 wherein said request 
includes said data file identifier. 

56. A peer-to-peer network cache as claimed in claim 53 wherein said data file 
identifier is provided by a peer-to-peer node responding to said request. 

57. A peer-to-peer network cache as claimed in claim 54, 55 or 56 wherein said 
code further comprises code to: 

identify a response to said request within said peer-to-peer traffic; and 

modify said response to address a data file within said data store. 

58. A peer-to-peer network cache as claimed in any one of claims 54 to 57 wherein 
said cache comprises an active node of said peer-to-peer network. 

59. A peer-to-peer network cache as claimed in any one of claims 54 to 58 wherein 
said code further comprises code to: 

obtain said requested data file from said peer-to-peer network; and 

store said requested data file in said data store. 

60. A peer-to-peer network cache as claimed in claim 59 wherein said code further 
comprises code to: 

identify duplicate data files using said data file identifiers; and 

limit the number of duplicate data files stored in said data store. 

61. A peer-to-peer network cache as claimed in any one of claims 54 to 60 wherein 
said data file identifier comprises a hash or checksum function. 
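
As an illustrative sketch only, a data store keyed by an identifier computed from file contents, with duplicate files held once as in claims 54, 60 and 61, might look as follows. The choice of SHA-1 as the hash function and the names ContentStore, identifier, put and get are assumptions of this sketch; the claims require only a hash or checksum.

```python
import hashlib
from typing import Dict, Optional


class ContentStore:
    """Stores data files under an identifier computed from their contents."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    @staticmethod
    def identifier(data: bytes) -> str:
        # SHA-1 is an assumed choice; any hash or checksum could serve.
        return hashlib.sha1(data).hexdigest()

    def put(self, data: bytes) -> str:
        """Store a file and return its identifier; duplicates are kept once."""
        file_id = self.identifier(data)
        self._files.setdefault(file_id, data)
        return file_id

    def get(self, file_id: str) -> Optional[bytes]:
        """Return the file for a requested identifier, or None if not cached."""
        return self._files.get(file_id)

    def __len__(self) -> int:
        return len(self._files)
```

Because the identifier depends only on the contents, two copies of the same file offered under different names map to a single entry in such a store.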

62. A peer-to-peer network cache as claimed in any one of claims 54 to 61 wherein 
said peer-to-peer network comprises a decentralised peer-to-peer network. 

63. A carrier carrying the processor control code of any one of claims 54 to 62. 

64. A method of reducing traffic in a distributed peer-to-peer network, the method 
comprising: 

monitoring peer-to-peer traffic of said network; 

identifying a request for a data file within said peer-to-peer traffic; 

identifying the requested data file from a data file identifier associated with said 
request within said peer-to-peer traffic; and 

providing said requested data file from a cache to a peer-to-peer node making 
said request. 

65. Computer program code to, when running, implement the method of claim 64. 

66. A carrier carrying the computer program code of claim 65. 

67. A peer-to-peer network cache for modifying peer-to-peer network traffic, the 
network cache comprising: 

a network interface for interfacing to a network over which said peer-to-peer 
network operates; 

a data store for storing digital fingerprint data for identifying data files, and 
corresponding data file source identifiers; and 

program memory storing processor control code; and 

a processor coupled to said network interface, to said data store and to said 
program memory for implementing said processor control code, said code comprising 
code for controlling the processor to: 

read peer-to-peer traffic on said network; 

identify a response to a peer-to-peer file request within said peer-to-peer traffic, 
said response including a digital fingerprint of a requested file; 

identify from said data store an alternative source for said requested file to a 
source of said response; and 

reply to said file request using said alternative source. 

68. A peer-to-peer network cache as claimed in claim 67 wherein said data store is 
further configured for storing said data file, and wherein said code further comprises 
code to modify said response to index a requested data file stored in said data store. 

69. A peer-to-peer network cache as claimed in claim 67 or 68 wherein said code 
further comprises code to: 

read said peer-to-peer traffic to identify digital fingerprint data and 
corresponding source identification data for data files; and 

store said digital fingerprint data and source identification data in said data store. 

70. A peer-to-peer network cache as claimed in claim 69 wherein said code further 
comprises code to: 

read said peer-to-peer data file data; and 

store said data file data in association with digital fingerprint data for said data 
file data in said data store. 

71. A peer-to-peer network cache as claimed in any one of claims 67 to 70 wherein 
said digital fingerprint data comprises hash value data. 

72. A carrier carrying the processor control code of any one of claims 67 to 71. 

73. A method of modifying peer-to-peer network traffic in a distributed peer-to-peer 
network, the method comprising: 

reading peer-to-peer traffic on said network; 

identifying a response to a peer-to-peer file request within said peer-to-peer 
traffic, said response including a digital fingerprint of a requested file; 

identifying from a cache an alternative source for said requested file to a source 
of said response; and 

replying to said file request using said alternative source. 
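
For illustration only, the steps of claim 73 might be sketched as follows. The types Response and SourceIndex, the method names learn and alternative, and the rule of taking the first known source that differs from the original are assumptions made for this sketch rather than features drawn from the specification.

```python
from typing import Dict, List, NamedTuple, Optional


class Response(NamedTuple):
    fingerprint: str  # digital fingerprint (for example a hash) of the file
    source: str       # address of the node offering the file


class SourceIndex:
    """Maps fingerprints to known sources, built from observed traffic."""

    def __init__(self) -> None:
        self._sources: Dict[str, List[str]] = {}

    def learn(self, fingerprint: str, source: str) -> None:
        """Record a source seen offering the file with this fingerprint."""
        known = self._sources.setdefault(fingerprint, [])
        if source not in known:
            known.append(source)

    def alternative(self, response: Response) -> Optional[Response]:
        """Return the response rewritten to use another known source, if any."""
        for candidate in self._sources.get(response.fingerprint, []):
            if candidate != response.source:
                return response._replace(source=candidate)
        return None
```

In this sketch the reply to the file request would carry the rewritten response where one is found; where alternative() returns None the original response would be left unchanged.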

74. A carrier carrying computer program code to, when running, modify peer-to-peer 
network traffic in a distributed peer-to-peer network, by: 

reading peer-to-peer traffic on said network; 

identifying a response to a peer-to-peer file request within said peer-to-peer 
traffic, said response including a digital fingerprint of a requested file; 

identifying from a cache an alternative source for said requested file to a source 
of said response; and 

replying to said file request using said alternative source. 